Tags: PMM alert alarm
Here is how to add alerting; as the saying goes, monitoring without alerting is not good monitoring! The main alert notification types in PMM (strictly speaking, in Grafana) are:
e-mail # the most commonly used, but the drawback is that the service provider's SMTP server often delays messages or treats them as spam
The official configuration tutorial: https://www.percona.com/blog/2017/01/23/mysql-and-mongodb-alerting-with-pmm-and-grafana/?utm_source=tuicool&utm_medium=referral
Webhook # not considered here,
cmdb.
Now, RegEx is fully tested for version 2.2, for 2.x, and for the subsequent migration solutions. I have digressed a little. The current approach to alert suppression, which I am building in Phase II of the alarm platform, is to suppress some of the e-mail and text-message alarms: a duplicate message is not sent again within 3 minutes, and the suppression window can be defined per message type on the platform.
Some serious and disaster-level alarms require the NOC to contact the responsible people. For these we can use a voice alarm API; most of them use twil
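The suppression rule described above (no repeat of the same message within 3 minutes, with a window that can be set per message type) could be sketched roughly as follows. This is only an illustration, not the platform's actual code: the class name AlertSuppressor, the method should_send, and the choice to measure the window from the last message that was actually sent are all assumptions.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class AlertSuppressor:
    """Drop repeats of the same alert inside a per-type time window.

    Hypothetical helper: names and behavior are assumptions for illustration.
    """

    def __init__(
        self,
        default_window: float = 180.0,  # 3 minutes, as described in the text
        windows: Optional[Dict[str, float]] = None,  # per-type overrides, in seconds
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.default_window = default_window
        self.windows = dict(windows or {})
        self.clock = clock
        # (alert type, message) -> time the message was last actually sent
        self._last_sent: Dict[Tuple[str, str], float] = {}

    def should_send(self, alert_type: str, message: str) -> bool:
        """Return True if the alert should go out, False if it is a suppressed repeat."""
        key = (alert_type, message)
        now = self.clock()
        window = self.windows.get(alert_type, self.default_window)
        last = self._last_sent.get(key)
        if last is not None and now - last < window:
            return False  # same message already sent inside the window
        self._last_sent[key] = now
        return True
```

A caller would check should_send("email", text) before handing the message to the mail or SMS gateway. Passing a custom clock makes the window easy to test without waiting.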
ZooKeeper Vulnerability Analysis
For those who do not know ZooKeeper, it is a well-known open-source project that supports highly reliable distributed coordination. It is trusted by many security companies around the world, including PagerDuty. It provides a highly available, linearizable service based on a leader model, in which the leader can be dynamically re-elected by a majority quorum to keep the service consistent.
The leader election and
alarm notification frequency, and so on, which Grafana cannot satisfy. For these rules we implemented our own alarm engine to handle the more complex alarm rules.
Notification
Grafana sends an alarm notification only when the state transitions, that is, when the rule enters the alerting state. If the condition stays satisfied for a period of time before it recovers, Grafana does not keep sending notifications; it sends nothing more until recovery, when it sends a reco
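The transition-only behavior described above can be illustrated with a small state machine. This is a minimal sketch and not Grafana's implementation: the names TransitionNotifier and evaluate, and the two-state model (ok, alerting), are assumptions made for the example.

```python
from typing import List, Optional

OK = "ok"
ALERTING = "alerting"


class TransitionNotifier:
    """Emit a notification only when the rule's state changes (hypothetical sketch)."""

    def __init__(self, rule_name: str) -> None:
        self.rule_name = rule_name
        self.state = OK
        self.sent: List[str] = []  # record of notifications that were emitted

    def evaluate(self, condition_met: bool) -> Optional[str]:
        """Evaluate one check; return "alert", "recovery", or None if nothing is sent."""
        new_state = ALERTING if condition_met else OK
        if new_state == self.state:
            return None  # no transition, so no notification
        self.state = new_state
        kind = "alert" if new_state == ALERTING else "recovery"
        self.sent.append(f"{self.rule_name}: {kind}")
        return kind
```

With this model, a condition that stays true for many evaluations produces one alert and, later, one recovery message. Repeated reminders at a fixed frequency would need extra logic, which is the kind of rule the text says required a separate alarm engine.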
written in Go, Telegraf has low CPU usage (0.4-5%). It is feature-rich, supports collecting data from external processes and containers, and has up to 55 data source plug-ins, so there is no need to deploy cAdvisor; I personally recommend it. Readers who need alerting may consider replacing InfluxDB with Prometheus, which includes Alertmanager to send e-mail, PagerDuty, and other notifications. The data backend can choose
expressions are provided to ease testing.
Use the command line interface to create and execute tests.
Store all information, both API calls and project data, in one independent space.
http://www.httpmaster.net/
9. Runscope
Runscope is a simple tool used to test and monitor API performance. It can help you verify whether the web service or API returns the correct data, and give a prompt when the API fails. Runscope also supports API and mobile app backend service testing.
Allows you t
team's chat room and send a text message to the on-duty engineer. In "Say what's happening", we name the monitor and add a short message to accompany the notification, suggesting a first step for the investigation. We @mention the Slack channel used by the ops team and use @pagerduty to route the alert to SMS.
NGINX metric notification
Save the integration monitor. Click the "Save" button at the bottom of the page. You are now monitoring a key NGINX work metric an
. If NGINX requests decrease, we want to notify our team. In this example, we will send a notification to the ops team's chat room and send a text message to the engineer on duty. In "Say what's happening", we name the monitor and add a short message to accompany the notification, suggesting a first step for the investigation. We @mention the Slack channel used by the ops team and use @pagerduty to route the alert to SMS.
NGINX metric notification
- mongodb_database
- mount
- network
- openstack_config
- pagerduty
- pip
- pkg
- pkgng
- pkgrepo
- powerpath
- pyenv
- quota
- raid
- rbenv
- redis
- rvm
- salt
- schedule
- serverdensity_device
- service
- slack
- smtp
- ssh_auth
- ssh_known_hosts
- stateconf
- status
- supervisord
- sysctl
- syslog_ng
- test
- timezone
- tls
- tomcat
- user
- vbox_guest
- virtualenv
- webutil
- winrepo
View all functions of the specified states
View all functions of the file state:
salt 'minion1' sys.list_state_functions file
minion1:
    - file.absent
, visualize
Elasticsearch - a Lucene-based document store used primarily for log indexing, storage, and analysis.
Fluentd - log collection and forwarding
Flume - distributed log collection and aggregation system
Graylog2 - pluggable log and event analysis server with alarm options
Heka - stream processing system that can be used for log aggregation
Kibana - visualizes log and timestamp data
Logstash - tool for managing events and logs
Octopussy - log management sol
this task to Gearman, a task queue system. Executing work asynchronously through the task queue means that a media upload can finish quickly (that is, posting to Instagram returns quickly) while the heavy lifting runs in the background. We have about 200 workers (all written in Python) consuming tasks from the queue. Our feed fan-out also uses Gearman, so that posting is as responsive for a user with many followers as it is for a new user.
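The pattern described above (accept the request quickly, push the heavy work onto a queue, let a pool of workers consume it) can be sketched with Python's standard library. This is not Gearman and not Instagram's code: it swaps in queue.Queue and threads purely for illustration, and the names start_workers, FeedService, and post are hypothetical.

```python
import queue
import threading
from typing import Callable, Dict, List

Task = Callable[[], None]


def start_workers(tasks: "queue.Queue[Task]", count: int) -> None:
    """Start `count` daemon threads that consume tasks from the queue forever."""

    def worker() -> None:
        while True:
            task = tasks.get()  # blocks until a task is available
            try:
                task()
            finally:
                tasks.task_done()  # lets tasks.join() know the task is finished

    for _ in range(count):
        threading.Thread(target=worker, daemon=True).start()


class FeedService:
    """Toy feed store whose fan-out runs in the background (assumed design)."""

    def __init__(self, tasks: "queue.Queue[Task]") -> None:
        self.tasks = tasks
        self.feeds: Dict[str, List[str]] = {}
        self._lock = threading.Lock()

    def post(self, post_id: str, followers: List[str]) -> str:
        """Return right away; the fan-out is only queued here."""
        self.tasks.put(lambda: self._fan_out(post_id, followers))
        return "accepted"

    def _fan_out(self, post_id: str, followers: List[str]) -> None:
        """Heavy part: copy the post into every follower's feed."""
        for follower in followers:
            with self._lock:
                self.feeds.setdefault(follower, []).append(post_id)
```

Because post only enqueues a task, its latency does not depend on how many followers the poster has; the cost moves to the workers.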
For message pushing, the most cost-effective solution we have
There are many tools and services for monitoring servers and VPSs. Open-source tools include Nagios, Cacti, Zabbix, Zenoss, Ganglia, ... If you do not want to host the monitoring software yourself, you can outsource to third-party services such as Pingdom, ServerDensity, ScoutApp, and PagerDuty. If your requirements are modest and you only want to monitor a website rather than the performance metrics of the whole server, you can cons
want.
check_command: This is the most critical setting of all. It defines what this service check actually monitors and which monitoring script it calls (I will talk about custom scripts later).
notifications_enabled: Whether to turn on alerting for this service. 1 is on, 0 is off (I will talk about alerting with PagerDuty later).
Next we do a centos_6
and disadvantages of Pushgateway
20. Explanation of Prometheus exporter code
21. Introduction to Grafana graphs for Prometheus
22. Grafana graph settings for Prometheus in detail
23. Grafana alerting for Prometheus, part 1
24. Grafana alerting for Prometheus, part 2
25. Grafana alerting for Prometheus, part 3
26. Prometheus enterprise practice: CPU monitoring
27. Prometheus enterprise practice: memory monitoring
28. Prometheus enterprise practice: usage estimation function
29. Prometheus enterprise practice: I/O/n
library uses pylibmc and libmemcached. Amazon also provides a caching service, ElastiCache. Instagram has tried it, but it is not cheap.
Task queue/push notification
The queue service uses Gearman and the notification system uses pyapns.
Monitoring
The server instances mentioned above add up to more than 100, so effective monitoring is essential. Munin is the main monitoring tool, with many custom plug-ins written for it, and Pingdom ser
there is no time to deal with the alarm. Staffing is insufficient, and when an alarm is delayed nobody can be reached.
Fortunately, there are alarm aggregation services abroad, such as PagerDuty and BigPanda.
The main function of this kind of tool is to bring the alarms of all monitoring systems into one platform, providing alarm aggregation as a service so that ops personnel can concentrate on IT events and avoid multi-platform switch
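The idea of collecting alarms from several monitoring systems in one place can be sketched as below. This does not reflect how PagerDuty or BigPanda work internally: the Alert fields, the aggregate function, and the choice to group by (host, check) are assumptions made for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Alert:
    source: str   # monitoring system that raised the alarm, e.g. "nagios"
    host: str     # affected host
    check: str    # what was checked, e.g. "disk"
    message: str  # original alarm text


def aggregate(alerts: Iterable[Alert]) -> Dict[Tuple[str, str], List[Alert]]:
    """Group alarms from all sources into incidents keyed by (host, check)."""
    incidents: Dict[Tuple[str, str], List[Alert]] = defaultdict(list)
    for alert in alerts:
        incidents[(alert.host, alert.check)].append(alert)
    return dict(incidents)
```

Under this grouping, a disk alarm raised by two different monitoring systems for the same host becomes one incident with two entries, so the on-call engineer looks at a single item instead of switching between tools.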